Evolutionary histories of breast cancer and related clones

Recent studies have documented frequent evolution of clones carrying common cancer mutations in apparently normal tissues, which are implicated in cancer development (refs 1–3). However, it remains unknown which additional driver events occur, and in what order, before one or more of these clones in normal tissues ultimately evolve into cancer. Here, using phylogenetic analyses of multiple microdissected samples from both cancer and non-cancer lesions, we show unique evolutionary histories of breast cancers harbouring der(1;16), a common driver alteration found in roughly 20% of breast cancers. The approximate timing of early evolutionary events was estimated from the mutation rate measured in normal epithelial cells. In der(1;16)(+) cancers, the derivative chromosome was acquired between early puberty and late adolescence, followed by the emergence of a common ancestor by the patient’s early 30s, from which both cancer and non-cancer clones evolved. Replacing the pre-existing mammary epithelium in the following years, these clones occupied a large area within the premenopausal breast tissues by the time of cancer diagnosis. Evolution of multiple independent cancer founders from the non-cancer ancestors was common, contributing to intratumour heterogeneity. The number of driver events did not correlate with histology, suggesting a role for local microenvironments and/or epigenetic driver events. A similar evolutionary pattern was also observed in another case evolving from an AKT1-mutated founder. Taken together, our findings provide new insight into how breast cancer evolves.


Supplementary Note 1. Estimation of sensitivity of WGS mutation calling
For single cell-derived organoids (n=71), the sensitivity of WGS mutation calling was estimated by calculating the fraction of unique heterozygous germline polymorphisms in each sample that were detected via paired analysis, using another participant's germline sample to mimic a matched control (Supplementary Fig. 1). For FFPE and fresh-frozen LCM samples, the sequencing reads of an LCM sample were merged with those of another participant's germline sample at different mixing rates to mimic tumour samples containing 10-100% tumour cells at the target coverage; the sensitivity was estimated using polymorphisms on chromosomal segments without a copy number change, in five FFPE and three fresh-frozen LCM samples (Supplementary Figs. 2,3).

Supplementary Figure 2 legend: Average of estimated sensitivity of WGS mutation calling for each mixing rate in FFPE LCM samples (n=5), with error bars indicating ± standard deviation. Sensitivity was estimated for four different calling algorithms; '3-caller combination' is the method used in our entire analysis, wherein the mutations called via ≥2 of 3 callers (Genomon2, Mutect2, and Strelka2) were considered 'high confidence' mutation calls.

Supplementary Figure 3: Estimated sensitivity of WGS mutation calling in fresh-frozen LCM samples
Supplementary Figure 3 legend: Average of estimated sensitivity of WGS mutation calling for each mixing rate in fresh-frozen LCM samples (n=3), with error bars indicating ± standard deviation. Sensitivity was estimated for four different calling algorithms; '3-caller combination' is the method used in our entire analysis, wherein the mutations called via ≥2 of 3 callers (Genomon2, Mutect2, and Strelka2) were considered 'high confidence' mutation calls.
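
The '3-caller combination' rule and the sensitivity estimate described in this note can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names (`consensus_calls`, `sensitivity`), the representation of a variant as a (chromosome, position, ref, alt) tuple, and the toy call sets are all assumptions made for illustration. The 'truth' set stands in for the heterozygous germline polymorphisms that are expected to be detected.

```python
from collections import Counter

def consensus_calls(calls_by_caller, min_callers=2):
    """Return the variants reported by at least `min_callers` callers."""
    votes = Counter()
    for calls in calls_by_caller.values():
        votes.update(set(calls))  # each caller votes at most once per variant
    return {variant for variant, n in votes.items() if n >= min_callers}

def sensitivity(detected, truth):
    """Fraction of truth-set variants that appear among the detected variants."""
    truth = set(truth)
    if not truth:
        raise ValueError("truth set is empty")
    return len(truth & set(detected)) / len(truth)

# Hypothetical toy data: four expected polymorphisms and one false positive.
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 200, "C", "T")
v3 = ("chr2", 50, "G", "A")
v4 = ("chr3", 10, "T", "C")
false_positive = ("chr4", 999, "A", "T")

truth = {v1, v2, v3, v4}
calls = {
    "Genomon2": {v1, v2, v3},
    "Mutect2": {v1, v2},
    "Strelka2": {v2, v4, false_positive},
}

high_confidence = consensus_calls(calls, min_callers=2)
estimated_sensitivity = sensitivity(high_confidence, truth)
print(sorted(high_confidence), estimated_sensitivity)
```

In this toy example only `v1` and `v2` are supported by two or more callers, so two of the four expected polymorphisms are recovered. Repeating the calculation at each mixing rate would give one sensitivity value per rate, as plotted in the figures.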

Supplementary Note 2. Validation of the phylogenetic analysis
Phylogenetic trees were reconstructed by combining somatic mutation data and copy number information across all LCM samples using MEGA and 'treemut'. To validate this approach, we reconstructed trees for two representative cases (KU779 (Fig. 2a) and KU539 (Extended Data Fig. 5a)) using a different tool, PyClone-VI (v0.1.0, URL: https://github.com/Roth-Lab/pyclone-vi). PyClone-VI estimates clusters of mutations shared by clones or subclones (each corresponding to a 'branch' in the tree) and the samples that share each cluster, on the basis of the mutant cell fraction, the number of mutant and wild-type reads, and the total and minor copy number at the mutation locus, for all mutations in all samples. These clusters were then visually ordered so that the samples in a parent branch comprise those in their child branches across all branchpoints. Here, mutation clusters were excluded if the sum of the mutant fractions of parent and child branches did not exceed 1.0, according to the pigeonhole principle. When clusters were inconsistent with one another, the clusters with fewer mutations were excluded so that the remaining clusters were consistent.
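
The pigeonhole argument referred to above can be sketched as follows: if two mutation clusters have cell fractions that sum to more than 1.0 in the same sample, at least some cells must carry both clusters, so the two clusters lie on one lineage (parent and child). The Python sketch below shows only this check. The function name `shares_cells`, the dictionary layout (sample name mapped to cell fraction), and the toy values are assumptions for illustration, and the exact way the criterion was applied when excluding clusters is not reproduced here.

```python
def shares_cells(frac_a, frac_b, threshold=1.0):
    """Return True if, in at least one common sample, the cell fractions of
    two clusters sum to more than `threshold` (pigeonhole argument)."""
    common_samples = frac_a.keys() & frac_b.keys()
    return any(frac_a[s] + frac_b[s] > threshold for s in common_samples)

# Hypothetical cell fractions per sample for three mutation clusters.
trunk = {"S1": 0.95, "S2": 0.90}
branch = {"S1": 0.40, "S2": 0.00}
other = {"S1": 0.00, "S2": 0.05}

print(shares_cells(trunk, branch))  # 0.95 + 0.40 > 1.0 in S1
print(shares_cells(branch, other))  # sums stay below 1.0 in both samples
```

When the sum does not exceed 1.0, the check is inconclusive: the two clusters may be nested or may sit on separate branches, so the cell fractions alone do not support a parent-child ordering.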
Because PyClone is limited in the number of samples it can analyse accurately, we first focused on the top three branchpoints of the tree reconstructed using MEGA/treemut. We randomly selected one sample if a branch comprised a set of multiple samples, which were analysed separately. Where multiple samples came from a single tumour focus, one of them was randomly selected.
The topology of the trees was largely concordant (22/25, 88%) between the MEGA/treemut- and PyClone-VI-reconstructed trees, with a few exceptions at peripheral branches (Supplementary Fig. 4). The assignment of driver mutations to branches was 100% concordant between them.
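
One possible way to quantify topological concordance between two trees is to describe each branch by the set of samples that descend from it, and to count how many branches of one tree are also present in the other. The Python sketch below takes this approach. It is an assumption about how such a comparison could be made, not a description of how the 22/25 figure was obtained; the function name `clade_concordance` and the toy trees are hypothetical.

```python
def clade_concordance(clades_a, clades_b):
    """Count the clades of tree A (as sets of sample names) that also occur
    in tree B. Returns (shared, total_in_a, fraction)."""
    a = {frozenset(c) for c in clades_a}
    b = {frozenset(c) for c in clades_b}
    if not a:
        raise ValueError("tree A has no clades")
    shared = len(a & b)
    return shared, len(a), shared / len(a)

# Hypothetical toy trees over five samples; they differ at one peripheral branch.
tree_a = [
    {"s1", "s2", "s3", "s4", "s5"},
    {"s1", "s2"},
    {"s3", "s4", "s5"},
    {"s4", "s5"},
]
tree_b = [
    {"s1", "s2", "s3", "s4", "s5"},
    {"s1", "s2"},
    {"s3", "s4", "s5"},
    {"s3", "s4"},
]

shared, total, fraction = clade_concordance(tree_a, tree_b)
print(f"{shared}/{total} branches concordant ({fraction:.0%})")
```

Under this definition, a disagreement confined to a peripheral branch leaves the trunk and the early branchpoints counted as concordant, which matches the pattern reported above.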